A Histogram Utilizing the Cluster Information

Authors

  • John Roh
  • Hyun Kyu Park
  • Kyoung Wook Min
  • Myoung-Ho Kim
Abstract

Histograms are summary structures of large datasets and are mainly used for selectivity estimation during query optimization. Selectivity estimation is the fast approximation of query result size. In this paper, we focus on multi-dimensional histograms, especially two-dimensional histograms. At the time of selectivity estimation, buckets that partially overlap a query return approximate results under the assumption that all objects within them are uniformly distributed. However, the objects within the region of a query are not likely to be uniformly distributed, so skews (or clusters) in buckets commonly degrade the accuracy of a histogram. Our aim is to utilize clusters in buckets to enhance the accuracy of selectivity estimation. We propose a new method that associates cluster information with a bucket. We present new schemes that define clusters formally, as well as algorithms that find such clusters efficiently. We show through experiments that our proposed method provides better performance than other existing well-known methods.
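
The uniform-distribution assumption described in the abstract can be illustrated with a minimal sketch. This is not the paper's method: the names `Rect`, `Bucket`, and `estimate_uniform` are hypothetical, and the code only shows the baseline estimate that a partially overlapping bucket contributes, namely its object count scaled by the fraction of its area covered by the query.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle (hypothetical helper, not from the paper)."""
    x_lo: float
    y_lo: float
    x_hi: float
    y_hi: float

    def area(self) -> float:
        # Degenerate or inverted rectangles have zero area.
        return max(0.0, self.x_hi - self.x_lo) * max(0.0, self.y_hi - self.y_lo)

    def intersect(self, other: "Rect") -> "Rect":
        # The result may be inverted when the rectangles are disjoint;
        # area() then returns 0.
        return Rect(max(self.x_lo, other.x_lo), max(self.y_lo, other.y_lo),
                    min(self.x_hi, other.x_hi), min(self.y_hi, other.y_hi))


@dataclass(frozen=True)
class Bucket:
    """A histogram bucket: a region plus the number of objects inside it."""
    region: Rect
    count: int


def estimate_uniform(buckets, query: Rect) -> float:
    """Estimate the query result size assuming uniformity inside each bucket."""
    total = 0.0
    for b in buckets:
        bucket_area = b.region.area()
        if bucket_area > 0:
            overlap = b.region.intersect(query).area()
            total += b.count * overlap / bucket_area
    return total


# One bucket covering [0,10] x [0,10] that holds 100 objects.
bucket = Bucket(Rect(0, 0, 10, 10), 100)
# A query covering the right half of the bucket.
print(estimate_uniform([bucket], Rect(5, 0, 10, 10)))  # 50.0
```

If all 100 objects were in fact clustered in the corner [0,2] x [0,2], the true result for this query would be 0, while the uniform estimate is 50. This is the kind of error that attaching cluster information to a bucket is meant to reduce.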

Related resources

Two-dimensional histogram equalization and contrast enhancement

Proposed method: two-dimensional histogram equalization (2DHE)

  • Utilizing contextual information to enhance contrast
  • Based on contrast observation in image
      − Improved by increasing gray-level
  • Including global histogram algorithm
      − Special case of 2DHE
  • Automatic parameter selection algorithm contained

Wasserstein k-means++ for Cloud Regime Histogram Clustering

Much work has sought to discern the different types of cloud regimes, typically via Euclidean k-means clustering of histograms. However, these methods ignore the underlying similarity structure of cloud types. Wasserstein k-means clustering is a promising candidate for utilizing this structure during clustering, but existing algorithms do not scale well and lack the quality guarantees of the Eu...

Local vs. Global Histogram-Based Color Image Clustering

In this paper, we present two image clustering techniques to automatically group color images that correlate with semantic concepts. This work goes towards satisfying the ever growing need for techniques that are capable of automatically generating semantic concepts for images from their visual features. We present two techniques and evaluate their relative performances based on the perceptual ...

Adding spatial distribution clue to aggregated vector in image retrieval

This study proposes a novel algorithm that enhances the distinctiveness of the traditional vector of locally aggregated descriptors (VLAD) using spatial distribution clue of local features. The algorithm introduces a new method to compute the spatial distribution entropy (SDE) of clusters. Unlike conventional methods, this algorithm considers the distribution of full spatial information provide...

Iris Feature Extraction and Recognition using Unbalanced Haar Wavelets & Modified Multi Texton Histogram

The iris, the colored disk in the eye, has attracted biometric technologies for creating robust identification and verification systems for human identification in a number of applications. Many techniques have been developed for iris recognition so far. Here, a new iris recognition system utilizing unbalanced wavelet coefficients and modified multi texton histogram feature coefficient...


Publication date: 2005